For certain quantum architectures and algorithms, most of the required resources are consumed
during the distillation of one-qubit magic states for use in performing Toffoli gates. I show that the
overhead for magic-state distillation can be reduced by merging distillation with the implementation
of Toffoli gates. The resulting routine distills 8 one-qubit magic states directly to a Toffoli state,
which can be used without further magic to perform a Toffoli gate.
Quantum algorithms frequently include a reversible
classical subroutine that dominates the computation. 
Consequently, the Toffoli gate, which is universal for clas- 
sical reversible computing, is commonly the most-used 
gate in an algorithm. Toffoli gates are inconvenient in 
many quantum architectures, but they can be imple- 
mented using, for example, one-qubit magic states and 
Clifford gates, where the Clifford gates are taken here 
to include both unitary Clifford operators and measure- 
ment and preparation in Pauli eigenbases. The initial 
preparation of magic states is generally poor, so prior 
to use, it is necessary to distill them, increasing their fi- 
delity with the intended state. Magic-state distillation 
can be a significant burden. Recent quantum architec- 
ture papers indicate that when running Shor's algorithm 
on interesting problem sizes 90% of the physical qubits 
can easily be devoted to magic-state distillation [ , ]. 
Such observations have helped to spur a significant body 
of new work focused on reducing the resources required 
by magic-state distillation routines [-'-s]. 

With a few exceptions ["-11], past research on magic- 
state distillation has focused on routines that transform 
multiple faulty copies of a magic state into fewer im- 
proved copies of the same state. Being simple to inject, 
one-qubit magic states are a natural starting point for 
distillation, and routines that output the same sort of 
state as the input are convenient. Ultimately, however, 
the reason for distilling magic states is frequently to im- 
plement a Toffoli gate. Rather than segregating the two 
tasks, I show in this paper that one can combine them to 
obtain reductions in the resources required to implement 
a Toffoli gate. 

I describe here a novel magic-state distillation routine,
the H-to-Toffoli routine, that takes 8 copies of the one-qubit
magic state $|H\rangle = \cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$ that suffer
Y errors with probability $p$ and, on success, outputs a
single Toffoli state, $|\mathrm{Toffoli}\rangle$, that suffers errors with a
probability of roughly $28p^2$. One measure of the efficiency
of a distillation routine is the state cost, that is,
the number of input copies of the magic state required
per improved output. Using the H-to-Toffoli routine, the
state cost for implementing a Toffoli gate with quadratically
reduced error is competitive with the most efficient
FIG. 1. Decompositions of the Margolus-Toffoli gate. (See
Ref. [i->].) On the left is a decomposition of the Margolus-Toffoli
gate in terms of a true Toffoli gate, a controlled-controlled-sign
($^{CC}Z$) gate, and a controlled-sign ($^{C}Z$) gate.
Note that the Margolus-Toffoli gate is equivalent to a Toffoli
gate followed (or preceded) by the transformation
$|101\rangle \rightarrow -|101\rangle$. On the right is a decomposition in terms of Clifford
gates and $\pi/4$ rotations about the Y axis of the Bloch sphere.



distillation routines known [ , ]. Moreover, the location 
cost, which I define as the number of locations in the dis- 
tillation circuit per output, is smaller by a factor of 2 to 
4 for the same task. 

Aside from the aforementioned differences, the scenario 
considered here is the standard one for magic state dis- 
tillation [12]: Clifford gates are taken to be perfect while 
magic states are assumed to suffer from a limited (by 
twirling) set of errors. The notation largely follows that 
of Meier et al. [. >]. 

The remainder of this paper is organized as follows: 
Section I introduces the H-to-Toffoli routine, and Sec. II
explains how a related approach can be used for distilling
Toffoli states. The efficiency of the H-to-Toffoli routine
and its relative performance are discussed in Sec. III. The
conclusion appears in Sec. IV, and the circuits used in the 
calculation of location costs are given in the appendix. 



I. H-TO-TOFFOLI DISTILLATION

At the heart of the new distillation routine are two observations:
First, the standard circuit for the Margolus-Toffoli
gate (shown in Fig. 1), when implemented using
twirled faulty $|H\rangle$ states (see Fig. 2), is equivalent to a
perfect Margolus-Toffoli gate potentially followed by Y
errors on the target and Z errors on the controls, and furthermore,
an error on a single $|H\rangle$ state always results in
a Y error on the target. Second, given Margolus-Toffoli
gates that occasionally suffer from Y (or X) errors on the
target one can use several such gates to prepare a Toffoli
FIG. 2. Circuit identity showing how the rotation $Y(\pm\pi/4) =
\cos(\pi/8) I \mp i \sin(\pi/8) Y$ can be implemented using Clifford gates,
the state $|H\rangle$, and the (Clifford) rotation $Y(\pi/2)$.
state with multiple target qubits and then check the target
qubits against each other to reduce the probability of
an undetected error on the prepared Toffoli state. Verified
Toffoli states can then be used to implement Toffoli
gates using the indirect method of Shor (shown in Fig. 3),
or they can be further checked against each other.

The H-to-Toffoli magic-state distillation routine is
shown in Fig. 4. Both Margolus-Toffoli gates use the
same control qubits, but they have different targets.
Because each target is initially prepared in the state $|0\rangle$,
the Margolus-Toffoli gate acts like the Toffoli
gate (see Fig. 5), a classical reversible gate. Consequently,
in the absence of errors, measuring the two
target qubits in the Z-eigenbasis would yield the same
result, and thus the parity measurement will yield 0. It
is straightforward to show that an error on an $|H\rangle$ state
used to implement the Margolus-Toffoli gate as shown in
Figs. 1 and 2 can be propagated to a Y error on the target
qubit together with, possibly, Z errors on the control
qubits. A single such error will thus be detected by the
parity measurement while any two errors will go undetected.
To lowest non-trivial order the acceptance probability
$a(p)$ is thus $1 - 8p$ and the output error probability
$e(p)$ is $\binom{8}{2} p^2 = 28p^2$. Exhaustive counting yields

$$a(p) = 1 - 8p + 56p^2 - 224p^3 + 560p^4 - 896p^5 + 896p^6 - 512p^7 + 128p^8$$

and

$$e(p)\,a(p) = 28p^2 - 168p^3 + 476p^4 - 784p^5 + 784p^6 - 448p^7 + 112p^8 .$$

Conditional on acceptance, only Z errors afflict the out- 
put control qubits, while the output target qubit is af- 
flicted only by X errors. As it happens, each of the seven 
possible non-trivial errors is equally likely. 
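
The counting behind these polynomials can be reproduced with a short enumeration. The Python sketch below is illustrative and not taken from the paper. It assumes that each faulty $|H\rangle$ state independently flips the target of its own Margolus-Toffoli gate and applies one of the four Z patterns (II, ZI, IZ, ZZ) to the shared controls, with each pattern occurring once per gate. The table `Z_PATTERNS` and all function names are hypothetical choices made to be consistent with the statements above.

```python
from itertools import product
from math import comb

N_INPUTS = 8  # |H> states consumed: 4 per Margolus-Toffoli gate

# ASSUMPTION (illustrative): the i-th |H> state of a gate, when faulty, flips
# that gate's target and applies this Z pattern to the two shared controls.
Z_PATTERNS = [(0, 0), (1, 0), (0, 1), (1, 1)]


def weight_poly(k, n=N_INPUTS):
    """Coefficients of p^k (1 - p)^(n - k); list index = power of p."""
    coeffs = [0] * (n + 1)
    for j in range(n - k + 1):
        coeffs[k + j] = (-1) ** j * comb(n - k, j)
    return coeffs


def accepted_error_polys():
    """Map each accepted output error (x_target, z1, z2) to its probability polynomial."""
    polys = {}
    for faults in product((0, 1), repeat=N_INPUTS):
        targets, z1, z2 = [0, 0], 0, 0
        for idx, faulty in enumerate(faults):
            if faulty:
                gate, slot = divmod(idx, 4)
                targets[gate] ^= 1
                z1 ^= Z_PATTERNS[slot][0]
                z2 ^= Z_PATTERNS[slot][1]
        if targets[0] != targets[1]:
            continue  # parity check fails; the output is discarded
        key = (targets[0], z1, z2)
        old = polys.get(key, [0] * (N_INPUTS + 1))
        polys[key] = [a + b for a, b in zip(old, weight_poly(sum(faults)))]
    return polys


def add_polys(polys):
    """Sum a collection of coefficient lists term by term."""
    total = [0] * (N_INPUTS + 1)
    for poly in polys:
        total = [a + b for a, b in zip(total, poly)]
    return total


polys = accepted_error_polys()
a_poly = add_polys(polys.values())  # coefficients of a(p)
ea_poly = add_polys(v for k, v in polys.items() if k != (0, 0, 0))  # e(p) a(p)
print("a(p)      coefficients:", a_poly)
print("e(p) a(p) coefficients:", ea_poly)
```

Under this assumed model the acceptance polynomial is the probability of an even number of faults, $(1 + (1-2p)^8)/2$, and the seven non-trivial entries of `polys` share a single polynomial, in line with the equal-likelihood remark above.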

The probability of an undetected X error on the output
target qubit can be made arbitrarily small by generating
more target qubits and checking their parities; the X-error
probability can be reduced to $O(p^o)$ by generating
$o$ target qubits and checking them against one another.
This does not reduce the probability of a Z error on the
output control qubits below $O(p^2)$; in fact, the coefficient
of $p^2$ worsens as $o$ becomes larger. If further reductions
in the probability of error on the control qubits are necessary,
one can resort to generic Toffoli-state distillation.



FIG. 3. A circuit implementing the Toffoli gate using a Toffoli
state, $|\mathrm{Toffoli}\rangle = (|000\rangle + |100\rangle + |010\rangle + |111\rangle)/2$, as per
Shor [1 I]. The target of the Toffoli gate corresponds to the
third qubit in each block.



II. TOFFOLI-STATE DISTILLATION 

For Toffoli-state distillation, it is helpful to restrict the 
errors that must be considered to Z errors on the control 
qubits and X errors on the target qubit. Conveniently, 
the H-to-Toffoli routine outputs states with errors of just
this form. Nevertheless, should it be necessary, Aliferis 
has shown that the desired error model can be enforced 
by twirling the Toffoli state with the appropriate set of 
Clifford gates [ I ^] . 

Toffoli states that suffer only X errors on the target 
qubit and Z errors on the control qubits can be used to 
implement Toffoli gates that suffer only X errors on the 
target and Z errors on the (matching) controls. Conse- 
quently, a quadratic reduction in the probability of an er- 
ror on the target qubit of such a state can be achieved by 
implementing the left circuit of Fig. 4 using Toffoli states.
This distillation routine can be shown to be equivalent
to the Toffoli-state distillation routine proposed by
Aliferis [ ■ ]. Given identically prepared Toffoli states, the
probability of an X error on the target is reduced from
$p$ to roughly $p^2$ and the probability of errors that do not
involve an X error on the target roughly doubles. Reduction
of the Z-error probability on a control qubit can
be achieved using the same circuit if one first swaps the
target qubit with the control qubit of the Toffoli state
using a pair of Hadamard gates (see Fig. 5). This transformation
takes Z errors on the former control qubit to
X errors on the new target qubit and vice versa.



III. EFFICIENCY 

The H-to-Toffoli routine is atypical of magic-state distillation
routines in that its inputs and outputs are different
and, as a consequence, it is not composable with
itself. This complicates comparisons with other distillation
routines. Taking the Toffoli state to have a "value"
of 4 $|H\rangle$ states, it might be said that the H-to-Toffoli routine
costs 2 input $|H\rangle$ states per output $|H\rangle$ state with
quadratically reduced error probability, numbers which
correspond to a scaling exponent of $\log_2 2 = 1$. This
FIG. 4. H-to-Toffoli distillation circuit. The output is discarded whenever a non-trivial measurement outcome is obtained. The
circuit on the left shows the distillation in terms of Toffoli gates while the circuit on the right shows the same distillation circuit
expanded in terms of the $|H\rangle$-state implementation of Margolus-Toffoli gates. All $\pi/4$ rotations about the Y axis are implemented indirectly as
in Fig. 2. By enumeration and error propagation it is easily shown that, to lowest non-trivial order, this circuit takes $|H\rangle$ states
that suffer Y errors with probability $p$ to Toffoli states that suffer errors (some combination of X errors on the target qubit
and Z errors on the controls) with probability $28p^2$.
FIG. 5. Toffoli state preparation. The circuit on the left shows
an obvious method of preparing the Toffoli state. The middle 
circuit shows Toffoli-state preparation using the Margolus- 
Toffoli gate (decomposed into three more familiar gates). 
From the right circuit it is clear that the target qubit of a 
Toffoli state can be changed using a pair of Hadamard gates. 
The first equality follows from the fact that a gate controlled 
on |0) is not executed; the same logic implies that a Margolus- 
Toffoli can be substituted for a true Toffoli gate whenever the 
target qubit is initially prepared in the state |0), as will be 
the case whenever the Margolus-Toffoli gate is used in the 
distillation routines presented here. 
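
The two claims in the caption of Fig. 5 can be checked directly on basis-state amplitudes. The following Python sketch is a minimal illustration, not the author's method. It assumes the qubit ordering |control, control, target⟩ and stores amplitudes as integers in units of 1/2. The helper names (`index`, `toffoli`, `margolus_toffoli`, `hadamard_x_sqrt2`) are hypothetical.

```python
def index(a, b, c):
    """Position of basis state |abc> in an 8-entry amplitude list."""
    return (a << 2) | (b << 1) | c


def bits_of(idx):
    """Inverse of index(): the bits [a, b, c] of a basis-state position."""
    return [(idx >> 2) & 1, (idx >> 1) & 1, idx & 1]


def toffoli(state, controls=(0, 1), target=2):
    """Flip `target` wherever both `controls` are 1."""
    out = [0] * 8
    for idx, amp in enumerate(state):
        bits = bits_of(idx)
        if bits[controls[0]] and bits[controls[1]]:
            bits[target] ^= 1
        out[index(*bits)] += amp
    return out


def margolus_toffoli(state):
    """Toffoli (target = qubit 2) followed by |101> -> -|101>, per Fig. 1."""
    out = toffoli(state)
    out[index(1, 0, 1)] *= -1
    return out


def hadamard_x_sqrt2(state, qubit):
    """Apply sqrt(2) * H, i.e. [[1, 1], [1, -1]], to `qubit` using integers."""
    out = [0] * 8
    mask = 1 << (2 - qubit)
    for idx, amp in enumerate(state):
        out[idx & ~mask] += amp
        out[idx | mask] += -amp if idx & mask else amp
    return out


# Amplitudes are in units of 1/2, so |+>|+>|0> is four ones.
plus_plus_zero = [0] * 8
plus_zero_plus = [0] * 8
for a in (0, 1):
    for b in (0, 1):
        plus_plus_zero[index(a, b, 0)] = 1
        plus_zero_plus[index(a, 0, b)] = 1

toffoli_state = toffoli(plus_plus_zero)
same_as_margolus = margolus_toffoli(plus_plus_zero) == toffoli_state

# Hadamards on qubits 1 and 2 move the target from qubit 2 to qubit 1.
swapped = hadamard_x_sqrt2(hadamard_x_sqrt2(toffoli_state, 1), 2)
retargeted = toffoli(plus_zero_plus, controls=(0, 2), target=1)
target_moved = swapped == [2 * amp for amp in retargeted]  # factor 2 = (sqrt 2)^2

print(same_as_margolus, target_moved)
```

The factor of 2 in the last comparison only accounts for the two unnormalized Hadamards; the states themselves agree.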
FIG. 6. Toffoli-state preparation circuit based on the
Margolus-Toffoli gate. This circuit can be used, together with
circuits from Figs. 3 and 2, to implement a Toffoli gate using
only 4 $|H\rangle$ states, as opposed to 7 $|H\rangle$ states as commonly
assumed in the literature.
would make it competitive with the most efficient distillation
routines [ , , ], but the comparison is unfair in two
ways: The output is not really a collection of $|H\rangle$ states,
and the routine is not scalable, being fixed in size and
non-composable. For this reason, I consider below the
overhead required for specific tasks: the production of a
Toffoli state or Toffoli gate using faulty $|H\rangle$ states, where
the ($|H\rangle$-state) inputs and (state or gate) outputs suffer
errors with probability $p$ and $O(p^2)$, respectively.

Using the H-to-Toffoli routine, 8 $|H\rangle$-type magic states
which suffer Y errors with probability $p$ are required to
distill a single Toffoli state which suffers errors with probability
$O(p^2)$. As illustrated in Fig. 3, Clifford gates and
a Toffoli state suffice to perform a Toffoli gate, so a Toffoli
gate can be implemented with the same parameters. I assume
in this analysis that all Toffoli gates are performed
using Toffoli states. For distillation routines other than
the H-to-Toffoli routine, $|H\rangle$ states are distilled prior to
being used in the Toffoli-state preparation circuit shown
in Fig. 6. Using this circuit, only 4 $|H\rangle$ states are required
to prepare each Toffoli state and therefore to implement
each Toffoli gate. Bravyi and Haah have shown that $|H\rangle$
states with quadratically suppressed errors can be prepared
at a cost arbitrarily close to 3 input $|H\rangle$ states



FIG. 7. (Color online) State injection. The circuit shown 
injects an arbitrary state $|\psi\rangle$ into a quantum code. The
unshaded portion of the circuit is implemented on encoded 
qubits, while the shaded gates are performed on unencoded 
qubits. The gate D represents decoding the quantum code. 



per output $|H\rangle$ state [7]. Jones has further shown that
quadratic error suppression can be obtained at a state
cost arbitrarily close to 2 as part of a larger distillation
routine [ ]. Multiplying each of these numbers by 4 gives
state costs of 12 and 8 per Toffoli state, so the
H-to-Toffoli routine, at a state cost of 8, yields an improvement
of 33% in the state cost compared to the best routines of
Bravyi and Haah and performs similarly to Jones' routines.
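
The state-cost arithmetic in this paragraph, and the state costs of the routines compared in Table I, can be sketched in a few lines of Python. This is only a bookkeeping illustration; the function and variable names are hypothetical, and the only inputs are the numbers quoted in the text (4 $|H\rangle$ states per Toffoli state via Fig. 6, and each routine's input and output counts).

```python
from fractions import Fraction

H_STATES_PER_TOFFOLI = 4  # |H> states consumed by the circuit of Fig. 6


def toffoli_state_cost(n_inputs, n_outputs):
    """Raw |H> inputs per Toffoli state when distilled |H> states feed Fig. 6."""
    return H_STATES_PER_TOFFOLI * Fraction(n_inputs, n_outputs)


state_costs = {
    "10-to-2": toffoli_state_cost(10, 2),
    "14-to-2": toffoli_state_cost(14, 2),
    "26-to-6": toffoli_state_cost(26, 6),
    "H-to-Toffoli": Fraction(8),  # 8 inputs go directly to one Toffoli state
}

# Asymptotic per-|H> costs quoted in the text: 3 (Bravyi and Haah), 2 (Jones).
bravyi_haah_limit = H_STATES_PER_TOFFOLI * 3
jones_limit = H_STATES_PER_TOFFOLI * 2
improvement = 1 - state_costs["H-to-Toffoli"] / bravyi_haah_limit

for name, cost in state_costs.items():
    print(f"{name:13s} {float(cost):6.2f}")
print(f"improvement over Bravyi-Haah limit: {float(improvement):.0%}")
```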

In addition to the state cost, I calculate location costs 
for the H-to-Toffoli distillation routine and some close
competitors. Locations are simply points in (discretized) 
space and time where a qubit is undergoing a gate or 
storing quantum information. The location cost of a dis- 
tillation routine is the number of locations required per 
output. In an effort to make the cost less dependent 
on the native gate set, one-qubit unitary Clifford gates 
are ignored when counting locations; one-qubit Clifford 
measurements are also ignored on the grounds that these 
Routine        State cost   Output error    Location cost     Location cost
                            probability     (Toffoli state)   (Toffoli gate)

10-to-2        20           $36p^2$         183               298
14-to-2        28           $28p^2$         179               334
26-to-6        17.33        $76p^2$         151               252.7
H-to-Toffoli   8            $28p^2$         36                91


TABLE I. Properties of Toffoli-gate and Toffoli-state implementations based on a selection of distillation routines. The 14-to-2
and 26-to-6 routines are members of a family of routines developed by Bravyi and Haah [7]. Within that family the 26-to-6
routine seems to have the lowest location cost. The 10-to-2 routine is a relatively efficient routine proposed in Ref. [-i]. The
circuits on which the location counts for these distillation routines are based appear in the appendix. The location costs quoted
for the Toffoli gate include both the circuit in Fig. 3 and the cost of state injection.



tend to be much faster than two-qubit Clifford gates on
robustly encoded qubits. 

As illustrated by Fig. 8 in the appendix, the number
of locations in the H-to-Toffoli distillation routine is 36.
The number of additional locations required to implement
the Toffoli gate as shown in Fig. 3 using the resultant
Toffoli state is 15. The total resource cost for
implementing a Toffoli gate is thus 51 locations and 8
$|H\rangle$ states. If each $|H\rangle$ state is injected from a lower
level of encoding using the circuit in Fig. 7 then 5 additional
locations are required per $|H\rangle$ state and the total
location cost rises to 91 locations per Toffoli gate. By
comparison, the appendix contains a compacted circuit
for the 26-to-6 routine that requires only 53.7 locations
per improved $|H\rangle$ state counting state injection and 32
discounting it. The circuit in Fig. 6 requires 23 locations
and 4 $|H\rangle$ states. Thus, using the 26-to-6 routine,
151 locations are required to produce a Toffoli state with
quadratically reduced error and 252.7, including state injection,
for a similarly improved Toffoli gate. The H-to-Toffoli
routine reduces the location cost for Toffoli states
and gates by a factor of 4.2 and 2.8, respectively. State
injection is included in the latter number as an indication
of the true cost of a Toffoli gate. The injection cost of
5 locations represents the lower end of the possible location
overhead per input $|H\rangle$ state; $|H\rangle$ states having previously
undergone distillation will cost many more locations, thereby enhancing
the relative performance of the H-to-Toffoli routine.
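
The location-cost figures above follow from simple bookkeeping, sketched here in Python. The constants are the numbers quoted in the text and figure captions (5 locations per injection, 15 for the circuit of Fig. 3, 23 locations and 4 $|H\rangle$ states for Fig. 6, 80 and at most 78 locations for the 10-to-2 and 14-to-2 circuits). The function `toffoli_costs` and the other names are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

INJECTION = 5          # locations per injected |H> state (Fig. 7)
GATE_FROM_STATE = 15   # extra locations to apply a Toffoli gate (Fig. 3)
FIG6_LOCATIONS = 23    # Toffoli-state preparation circuit (Fig. 6)
FIG6_H_STATES = 4      # distilled |H> states consumed by Fig. 6


def toffoli_costs(locations_per_h, raw_inputs_per_h):
    """Return (Toffoli-state, Toffoli-gate) location costs for a routine
    that distills |H> states which are then fed into the circuit of Fig. 6."""
    state = FIG6_H_STATES * locations_per_h + FIG6_LOCATIONS
    injected = FIG6_H_STATES * raw_inputs_per_h * INJECTION
    gate = state + GATE_FROM_STATE + injected
    return state, gate


table = {
    "10-to-2": toffoli_costs(Fraction(80, 2), Fraction(10, 2)),
    "14-to-2": toffoli_costs(Fraction(78, 2), Fraction(14, 2)),
    "26-to-6": toffoli_costs(Fraction(32), Fraction(26, 6)),
    # The H-to-Toffoli routine outputs a Toffoli state directly from 8 inputs.
    "H-to-Toffoli": (Fraction(36), Fraction(36 + GATE_FROM_STATE + 8 * INJECTION)),
}

state_ratio = table["26-to-6"][0] / table["H-to-Toffoli"][0]
gate_ratio = table["26-to-6"][1] / table["H-to-Toffoli"][1]

for name, (state, gate) in table.items():
    print(f"{name:13s} state={float(state):6.1f} gate={float(gate):6.1f}")
print(f"ratios: {float(state_ratio):.1f}, {float(gate_ratio):.1f}")
```

The per-output figure of 53.7 locations for the 26-to-6 routine corresponds to 32 + 5 × 26/6 in this accounting.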

The properties of Toffoli-state and Toffoli-gate implementations
using some of the most resource-efficient distillation
routines are summarized in Tab. I.



IV. CONCLUSION 

Magic-state distillation, as required for the implemen- 
tation of robust Toffoli gates, is a significant driver of 
overhead in some proposed quantum computing architec- 
tures. In this paper I have shown that the overhead can 
be reduced by merging magic-state distillation with the 
implementation of Toffoli gates. I described a novel distillation
routine that distills 8 copies of a one-qubit magic
state, $|H\rangle$, directly into a Toffoli state with quadratically
reduced error probability. For the purpose of implement- 



ing Toffoli gates, the state cost of this routine is as good 
as or better than existing routines and the location cost 
is a factor of 2 to 4 lower compared to other distillation 
routines that provide quadratic error suppression. 

Subsequent to the development of the H-to-Toffoli routine
described herein, I learned that Cody Jones has developed
a closely related routine, albeit one with a distinct
motivation and a very different looking quantum
circuit. Jones' routine likewise distills 8 copies of a one-qubit
magic state into a single Toffoli state and provides
the same amount of error suppression. More information
on this complementary work can be found in Ref. [ ].

My analysis has focused on a single round of magic- 
state distillation. Partly this reflects the fact that the 
proposed routine outputs Toffoli states, for which few 
distillation routines are known, and partly it reflects a 
belief that many rounds of distillation at some fixed 
level of encoding will be uncommon, as has previously 
been argued in Refs. [7, 17]. Distillation routines that 
provide a quadratic reduction in error per round are 
among the most efficient [ ], and it is unnecessary to 
perform distillation using encoded gates that are significantly
less error prone than the outputs of the distillation
routine [5, 7, 17], so it seems likely that maximum effi- 
ciency will typically be achieved by doing magic-state 
distillation at many different levels of encoding, where a 
quadratic reduction in the error probability is achieved 
at each level. As illustrated in the recent architectural 
paper of Fowler et al. [2], surface codes are well suited 
to this approach, being very flexible in the degree of en- 
coding. For concatenated coding, the steps in the degree 
of encoding will typically be greater, so routines with 
stronger error suppression may play a role, but given the 
difference in the sizes of the coefficients for the output 
error probabilities for magic-state distillation as opposed 
to fault-tolerant error correction, I would not expect very 
high-order distillation to be necessary. 

I have introduced the notion of location cost because 
I feel that this more accurately identifies the resource 
that one wishes to minimize in order to control the over- 
head for fault-tolerant quantum computing. In the limit 
of many rounds of distillation, the state cost becomes 
the determining factor in the location cost, but when 
only a few rounds of distillation are required the two can 
yield significantly different rankings of distillation routines.
The location cost is not a particularly portable
measure; the best method of counting locations will vary
between different architectures, as is illustrated by a recent
paper optimizing the location cost of the 15-to-1
magic-state distillation routine for a surface code architecture
[jS]. Additionally, while I have focused on the
implementation of the Toffoli gate, it is perhaps worth
mentioning that the Margolus-Toffoli gate can be implemented
at a slightly lower location cost for some of
the distillation routines considered here using the circuit
in Fig. 1. As has been previously noted, the Margolus-Toffoli
gate is often an acceptable stand-in for the Toffoli
gate [19]; in particular, the Margolus-Toffoli gate can be
used in a quantum computation to calculate any classical 
function that is subsequently exactly undone, an occur- 
rence not uncommon in reversible computing. 

On the topic of magic-state distillation, several inter- 
esting open problems remain. Thus far, Toffoli-state dis- 
tillation has received little attention. The only published 
routine has a state cost of 8 for quadratic error suppres- 
sion [15], though a more efficient method was recently 
suggested by Jones in the conclusion of Ref. [1]. An ob- 
vious avenue of investigation is thus to search for new 



routines distilling Toffoli states. The compaction of exist- 
ing distillation routines to minimize their location costs 
in various architectural settings is another worthwhile en- 
deavor. Aside from magic-state distillation, other (older) 
techniques exist for implementing non-Clifford gates in 
concatenated codes [L4] and there even exist quantum 
codes, including some varieties of surface code, that re- 
quire no distillation at all [ - i, J ;]. One might therefore 
investigate the performance tradeoff between magic-state 
distillation and these other techniques for implement- 
ing fault-tolerant non-Clifford gates. Finally, despite the 
caveats listed above, it would be interesting to determine 
the location cost as a function of the target error rate for 
a variety of distillation sequences under the assumption 
of perfect Clifford gates. 
FIG. 8. H-to-Toffoli distillation circuit, expressed using exclusively Clifford gates and $|H\rangle$ states. In total, this circuit uses 8
$|H\rangle$ states and 36 locations to distill a single Toffoli-state output. For the purpose of counting locations, measurements and
unitary one-qubit Clifford gates are ignored.
FIG. 9. (Color online) Compacted circuit for the 14-to-2 distillation routine [7] expressed in terms of Clifford gates and $|e^{i\pi/4}\rangle$
states. The gates in the highlighted region act on every pair of unmeasured qubits. Depending on the Z-measurement
outcomes, some subset of the highlighted gates are implemented. The angles $\theta_1$ and $\theta_2$ are multiples of $\pi/2$, where the multiple
is likewise dependent on the Z-measurement outcomes. The output is discarded whenever the outcome of any of the remaining
measurements is non-trivial. Using standard circuit identities, it can be shown that any distillation routine of the sort proposed
by Bravyi and Haah in Ref. [, ] can be expressed in this form, though efficiency-wise it is not always desirable. In total, this
circuit uses 14 $|e^{i\pi/4}\rangle$ states and at most 78 locations to distill 2 improved copies of $|e^{i\pi/4}\rangle$. Conditional on success, the
marginal probability of error is reduced from $p$ to roughly $7p^2$. For the purpose of counting locations, measurements and
unitary one-qubit Clifford gates are ignored. Note that $|e^{i\pi/4}\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt{2} = e^{-i\pi/8} H Z(-\pi/2) |H\rangle$, so the states $|e^{i\pi/4}\rangle$
and $|H\rangle$ are equivalent for the purpose of location counting.
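
The Clifford equivalence between $|e^{i\pi/4}\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt{2}$ and $|H\rangle$ noted in the caption of Fig. 9 can be checked numerically. The Python sketch below is illustrative only. It assumes the rotation convention $Z(\theta) = e^{-i\theta Z/2}$ (matching the form of the Y rotation in Fig. 2) and the specific identity $|e^{i\pi/4}\rangle = e^{-i\pi/8} H Z(-\pi/2) |H\rangle$; the helper names are hypothetical.

```python
import cmath
import math


def apply(matrix, vec):
    """Multiply a 2x2 matrix by a length-2 state vector."""
    return [sum(matrix[r][c] * vec[c] for c in range(2)) for r in range(2)]


def z_rotation(theta):
    """ASSUMED convention: Z(theta) = exp(-i * theta * Z / 2)."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


s = 1 / math.sqrt(2)
hadamard = [[s, s], [s, -s]]

h_state = [math.cos(math.pi / 8), math.sin(math.pi / 8)]  # |H>
a_state = [s, s * cmath.exp(1j * math.pi / 4)]            # |e^{i pi/4}>

rotated = apply(hadamard, apply(z_rotation(-math.pi / 2), h_state))
candidate = [cmath.exp(-1j * math.pi / 8) * amp for amp in rotated]

max_deviation = max(abs(x - y) for x, y in zip(candidate, a_state))
print("max deviation:", max_deviation)
```

Because $H$ and $Z(-\pi/2)$ are one-qubit Clifford gates, which are ignored when counting locations, either state can stand in for the other in the location counts.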
Appendix: Location counting circuits

The location costs quoted for distillation routines in 
this paper were obtained using the circuits shown in 
Figs. 8, 9, 10, and 11. 
FIG. 11. Circuit for the 10-to-2 magic-state distillation routine [ ] expressed in terms of Clifford gates and $|H\rangle$ states. The
output is discarded whenever a non-trivial measurement outcome is obtained. In total, this circuit uses 10 $|H\rangle$ states and 80
locations to distill 2 improved copies of $|H\rangle$. Conditional on success, the marginal probability of error is reduced from $p$ to
roughly $9p^2$. For the purpose of counting locations, measurements and unitary one-qubit Clifford gates are ignored.



